Scalability of spin FPGA: A Reconfigurable Architecture based on spin MOSFET

The scalability of a field-programmable gate array (FPGA) using spin MOSFETs (spin FPGA) with a
magnetocurrent (MC) ratio in the range of 100% to 1000% is discussed for the first time. The area and
speed of a million-gate spin FPGA are numerically benchmarked against a CMOS FPGA for 22 nm, 32 nm
and 45 nm technologies, including 20% transistor size variation. We show that area is reduced and
speed is increased in the spin FPGA owing to the nonvolatile memory function of the spin MOSFET.




INTRODUCTION 



The spin metal-oxide-semiconductor field-effect transistor
(spin MOSFET) is a novel MOSFET whose source and
drain are contacted with ferromagnetic materials [1].
Ferromagnetic materials provide stable and robust
nonvolatile memory [2]. Fig. 1(a) shows a spin MOSFET in
which the write process is carried out by using a magnetic
tunneling junction (MTJ) [3,4]. The spin MOSFET directly
couples a logic element with a nonvolatile memory element,
opening up a path to a new style of logic-in-memory
architecture [5].

A field-programmable gate array (FPGA) has the great
advantage that the chip is completely programmable and
reconfigurable. However, a conventional FPGA includes
a large amount of static random access memory (SRAM), which
is a volatile memory composed of six transistors per cell and
faces the fabrication limits of the Si MOSFET. Thus,
new FPGAs based on novel devices have been anticipated.
Here, for the first time, we report on numerical bench- 
mark for an island-style FPGA using 22nm, 32nm and 
45nm spin MOSFETs (spin FPGA) [J] by improving 
standard benchmark tools [6]. Compared with other 
proposals 0,(1], spin FPGA has an advantage in that it is 
based on Si transistors equipped with stable nonvolatile
magnetic memory. Moreover, an SRAM cell (six transistors) can be
replaced by one spin MOSFET. Many SRAM cells are used
in an FPGA, for example in lookup tables (LUTs) and in the
pass-transistor interconnect area. Therefore, this replacement
reduces the transistor count and the FPGA area. Because the
speed of an FPGA is governed by wire length, the
smaller area of the spin FPGA leads to faster performance.
Monte Carlo simulation based on the Predictive Tech- 
nology Model [t| is carried out to consider variation of 
device size assuming fabrication difficulties. Although 
experiments on MTJ 0] at present show the maximum 
magnetocurrent (MC) ratio is 260% ($RA \approx 10\ \Omega\,\mu\mathrm{m}^2$), in
this paper we treat 100% < MC ratio < 1000%, assuming
future realization of larger MC.

FIG. 1: (a) Spin-based MOSFET of the "Spin-Transfer-Torque-Switching
MOSFET" type, in which a magnetic tunnel junction (MTJ) is attached
to one of the electrodes. (b) $I_d$-$V_g$ characteristics for
parallel and antiparallel states (100% < MC < 1000%) based on the
PTM SPICE model (see text).



SPIN FPGA 

Spin MOSFET. We model the spin MOSFET by
changing a SPICE parameter (mobility) such that the MC ratio,
defined by $MC = (I_P - I_{AP})/I_{AP}$, coincides with a given
value ($I_P$ and $I_{AP}$ are the parallel and antiparallel currents,
respectively). For $I_P$, we use the same SPICE parameters
as those of the conventional MOSFET (Fig. 1(b)).
Although there is extra resistance owing to the MTJ
in the spin MOSFET, Ref. [10] reported that the
resistance of a 50 nm square MTJ can be kept below
400 Ω, which is negligible compared with
the resistance of a conventional MOSFET, of the order of
10 kΩ.
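
As a minimal sketch of the MC definition above (not the SPICE procedure itself, which adjusts the mobility parameter), the following Python snippet inverts $MC = (I_P - I_{AP})/I_{AP}$ to obtain the antiparallel current for a given MC ratio, and compares the quoted MTJ resistance with the MOSFET resistance. The function names and the normalized current `I_P = 1.0` are illustrative assumptions.

```python
def antiparallel_current(i_p, mc):
    """Return I_AP from MC = (I_P - I_AP) / I_AP, i.e. I_AP = I_P / (1 + MC).

    `mc` is a fraction (MC = 100% corresponds to 1.0).
    """
    if mc <= -1.0:
        raise ValueError("MC must be greater than -1")
    return i_p / (1.0 + mc)


def mc_ratio(i_p, i_ap):
    """Magnetocurrent ratio as defined in the text."""
    return (i_p - i_ap) / i_ap


# Normalized parallel current (assumption: only ratios matter here).
I_P = 1.0

# MC ratios in the range treated in the paper (100% to 1000%).
for mc in (1.0, 2.0, 4.0, 6.0, 10.0):
    i_ap = antiparallel_current(I_P, mc)
    print(f"MC = {mc * 100:.0f}%  ->  I_AP / I_P = {i_ap / I_P:.3f}")

# Series-resistance check with the values quoted in the text.
R_MTJ = 400.0     # ohm, upper bound quoted for a 50 nm square MTJ
R_MOSFET = 10e3   # ohm, order of magnitude of the MOSFET resistance
MTJ_FRACTION = R_MTJ / R_MOSFET
print(f"R_MTJ / R_MOSFET = {MTJ_FRACTION:.2f}")
```

Under these numbers the MTJ adds a few percent to the channel resistance, which is the sense in which the text treats it as negligible.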

Spin Cluster Logic Block. Fig. 2 shows our spin LUT
structure for 4-inputs and 1-output, which is a typ- 
ical set of LUT parameters 0]. Transistor sizes of am- 
plifiers are adjusted such that the input pulse signal is 
appropriately transferred to the output of LUT. 

Pass transistor. We propose a spin-controlled pass transistor,
depicted in Fig. 3(a). SPICE simulations show
that the speed of the pass transistor in Fig. 3(a) is of the
same order as that in Fig. 3(b) when the width of the
control transistors is adjusted (the total transistor area of
Fig. 3(a) is four in units of the minimum transistor size).
Although this pass transistor structure has a disadvantage,
namely a leakage path from the p-type transistor (PMOS) to the
n-type transistors (NMOS), this power dissipation can be reduced
by limiting the on-state to only when it is required [12].

FIG. 2: Schematic of a 4-input lookup table based on spin
MOSFETs (spin LUT). Spin MOSFETs replace the SRAMs at
the leftmost part of this figure.

FIG. 3: (a) New routing pass transistor using a spin MOSFET
and (b) that using conventional SRAM. In (a), one extra transistor
is required to change the P/AP state of the spin MOSFET,
and the widths of the spin MOSFET and PMOS are enlarged to
control the ON/OFF state of the attached pass transistor. The
estimated number of required control transistors in (a) is four in
the minimum-width transistor area model [(|.



FPGA AREA REDUCTION BY SPIN MOSFET 

First, let us compare the number of transistors in the spin
LUT and the CMOS LUT. In Ref. [H|, we only counted the
number of transistors of a spin LUT. Here, we estimate
the number of transistors for a general clustered logic
block (CLB) in which four CLBs are clustered with 10
inputs and 4 outputs. For a $K$-input LUT, $2^K$ SRAMs and
$2^{K+1} - 2$ pass transistors (multiplexer trees) are required,
with three input buffers. Then the total number of transistors
in a complementary MOS (CMOS) LUT, $N^{(\mathrm{cmos})}_{\mathrm{LUT}}$,
is given by $2^{K+3} - 2 + 6K$. In a spin LUT (Fig. 2),
the leftmost SRAMs are replaced by spin MOSFETs
with an additional write/erase transistor. In addition,
a sense amplifier (five transistors), a reference transistor
and two power supply transistors are required. Thus,
the number of transistors required in the spin LUT is
given by $N^{(\mathrm{spin})}_{\mathrm{LUT}} = 3 \times 2^K + 6(K + 1)$, and we have
$N^{(\mathrm{cmos})}_{\mathrm{LUT}} - N^{(\mathrm{spin})}_{\mathrm{LUT}} = 5 \times 2^K - 8$.
For example, a 4-input LUT conventionally has 150 transistors,
whereas the spin LUT includes 78 transistors (48% reduction).

Circuit area is calculated by the minimum-width transistor
area model [||, in which each transistor area is estimated
in units of a minimum-width NMOS. When $W_{\min}$
and $S_{\min}$ are the width and area of the minimum NMOS, respectively,
a transistor of width $Z W_{\min}$ is estimated as having
an area of $(1 + Z) S_{\min}/2$. The width of the PMOS is determined
such that an inverter switches at half of the drain voltage.
For the PMOSs of the 22 nm, 32 nm and 45 nm nodes,

$$Z^{(\mathrm{pmos})}_{22\,\mathrm{nm}} = \text{i-.OO}, \quad Z^{(\mathrm{pmos})}_{32\,\mathrm{nm}} = 2.22, \quad Z^{(\mathrm{pmos})}_{45\,\mathrm{nm}} = 2.57 \tag{1}$$

(PMOS is scaled down more than NMOS because of advanced
technologies such as strain effects.) The area of recent
FPGAs is mostly occupied by the interconnect, or
wiring, part. Wire resistance and capacitance are
calculated from Ref. [13].
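
The transistor-count formulas and the minimum-width area model described in this section can be checked with a short Python sketch. The function names are illustrative, the split of the CMOS count into SRAM, multiplexer and buffer terms is inferred from the quoted total $2^{K+3} - 2 + 6K$, and only the 32 nm and 45 nm width ratios from Eq. (1) are used.

```python
def cmos_lut_transistors(k):
    """Transistor count of a K-input CMOS LUT: 2^(K+3) - 2 + 6K."""
    sram = 6 * 2**k        # 2^K SRAM cells, six transistors each
    mux = 2**(k + 1) - 2   # pass transistors of the multiplexer tree
    buffers = 6 * k        # input buffers (inferred from the 6K term)
    return sram + mux + buffers


def spin_lut_transistors(k):
    """Transistor count of a K-input spin LUT: 3 * 2^K + 6(K + 1)."""
    return 3 * 2**k + 6 * (k + 1)


def min_width_area(z):
    """Area of a transistor of width Z * W_min, in units of S_min."""
    return (1.0 + z) / 2.0


K = 4
n_cmos = cmos_lut_transistors(K)
n_spin = spin_lut_transistors(K)
reduction = (n_cmos - n_spin) / n_cmos
print(f"K = {K}: CMOS LUT {n_cmos}, spin LUT {n_spin}, reduction {reduction:.0%}")

# PMOS width ratios Z from Eq. (1) for the 32 nm and 45 nm nodes.
PMOS_Z = {"32nm": 2.22, "45nm": 2.57}
for node, z in PMOS_Z.items():
    # Inverter = one minimum-width NMOS plus one PMOS of width Z * W_min.
    inverter_area = min_width_area(1.0) + min_width_area(z)
    print(f"{node}: PMOS area {min_width_area(z):.3f}, inverter area {inverter_area:.3f}")
```

The PMOS area shrinks toward the NMOS area as $Z$ decreases, so replacing an SRAM cell by an NMOS-type spin MOSFET removes a larger share of the area at nodes with larger $Z$ only in absolute terms; how this feeds into the full FPGA area depends on the interconnect, which this sketch does not model.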



BENCHMARK RESULTS AND DISCUSSION 

The area and speed of the spin FPGA over 20 typical million-gate
circuits are benchmarked with a modified VPR
ver. 5 [6] for 22 nm, 32 nm and 45 nm transistors. We take
standard parameters such as $F_s = 3$ (Wilton switch box),
$F_{c,\mathrm{in}} = 1.0$ and $F_{c,\mathrm{out}} = 0.25$ with length-1 wire
segments [6]. Figs. 4-6 show the average results over 200 Monte
Carlo simulations for up to 20% (3 sigma) variations of
length and width in 22 nm transistors, where the vertical
axes show the advantage in area, critical path delay and
area-delay product, defined by $(\Theta_{\mathrm{cmos}} - \Theta_{\mathrm{spin}})/\Theta_{\mathrm{spin}}$
for $\Theta = \{A$ (area), $t_{\mathrm{delay}}$ (critical path delay),
$A \times t_{\mathrm{delay}}$ (area-delay product)$\}$. The area-delay product
is treated as a metric of FPGA performance. Fig. 4 and Table I
show that the area of the spin FPGA is greatly reduced compared
with the CMOS FPGA. For 22 nm transistors, an average area
reduction of 16% is realized. This area reduction leads to a smaller
critical path delay, resulting in faster operation
of the spin FPGA. In Fig. 5, speed is improved by an average
of 24%. As the MC ratio increases, the P/AP signals that
go into the amplifier in the spin LUT (Fig. 2) become clearer.
This leads to more robust operation against transistor
variation, resulting in shorter delay in Fig. 5. Thus, the
area-delay product is improved on average by 43%. Fig. 7
shows summarized benchmark results from 22 nm to
45 nm transistors. As mentioned above, as the transistor
scale decreases, the ratio of PMOS area to NMOS area
decreases. This means that the effect of area reduction by
the spin MOSFET (NMOS) becomes larger, resulting in
better performance at small transistor nodes.
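
The size-variation sampling described above can be pictured with the following Python sketch. It is an illustration only: the text gives the 20% (3 sigma) bound and the 200 runs, while the Gaussian distribution, the independence of width and length, the fixed seed and the function name are assumptions made here.

```python
import random
import statistics


def sample_size_factors(n_runs=200, max_variation=0.20, seed=0):
    """Draw relative (width, length) scale factors for n_runs Monte Carlo runs.

    Assumption: Gaussian variation whose 3-sigma point equals max_variation.
    """
    sigma = max_variation / 3.0
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    factors = []
    for _ in range(n_runs):
        width_factor = 1.0 + rng.gauss(0.0, sigma)
        length_factor = 1.0 + rng.gauss(0.0, sigma)
        factors.append((width_factor, length_factor))
    return factors


factors = sample_size_factors()
widths = [w for w, _ in factors]
print(f"runs: {len(factors)}")
print(f"mean width factor: {statistics.mean(widths):.3f}")
print(f"std of width factor: {statistics.stdev(widths):.3f}")
```

Each sampled pair would then scale the nominal transistor width and length before one benchmark run; the averaging over runs is what Figs. 4-6 report.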

One of the advantages of the spin MOSFET compared with
CMOS with an interlayer MRAM system is that, for the spin
MOSFET, a change in MC ratio directly affects the subthreshold
region of the MOSFET, which leads to more efficient device
operation. The effect of direct injection of spin into the
channel on device performance will be clarified in more
detail in the near future.

FIG. 4: Benchmark calculation of the advantage of spin
FPGA over CMOS FPGA for 20 circuits (area). Rightmost
data shows the average over the 20 circuits.



FIG. 5: Benchmark calculation of the advantage of spin
FPGA over CMOS FPGA for 20 circuits (delay). The mean critical
path delay of the CMOS FPGA is 30.8 ns and those of the spin
FPGA are 25.4 ns (MC=100%), 25.5 ns (MC=200%), 25.2 ns
(MC=400%), 24.5 ns (MC=600%) and 24.5 ns (MC=1000%).
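
As a worked example of the advantage metric $(\Theta_{\mathrm{cmos}} - \Theta_{\mathrm{spin}})/\Theta_{\mathrm{spin}}$, the Python sketch below applies it to the mean delays quoted in the caption of Fig. 5 and then combines the average area and delay figures from the text. It assumes that the quoted 16% and 24% are values of this metric and treats the averages as single ratios, which is only an approximation because the paper averages per-circuit results; the function names are illustrative.

```python
def advantage(theta_cmos, theta_spin):
    """Advantage metric: (theta_cmos - theta_spin) / theta_spin."""
    return (theta_cmos - theta_spin) / theta_spin


def area_delay_advantage(area_adv, delay_adv):
    """Combine two advantages: ratios multiply, so (1 + a)(1 + d) - 1."""
    return (1.0 + area_adv) * (1.0 + delay_adv) - 1.0


# Mean critical path delays (ns) from the Fig. 5 caption.
CMOS_DELAY_NS = 30.8
SPIN_DELAY_NS = {100: 25.4, 200: 25.5, 400: 25.2, 600: 24.5, 1000: 24.5}

delay_advantages = {
    mc: advantage(CMOS_DELAY_NS, t_spin) for mc, t_spin in SPIN_DELAY_NS.items()
}
for mc, adv in delay_advantages.items():
    print(f"MC = {mc}%: delay advantage = {adv:.1%}")

# Average figures quoted in the text: 16% (area) and 24% (delay).
combined = area_delay_advantage(0.16, 0.24)
print(f"combined area-delay advantage = {combined:.1%}")
```

The combined value of about 44% is close to the 43% average area-delay improvement reported in the text, which suggests the three quoted averages are mutually consistent under this reading.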



CONCLUSION 

The spin FPGA was numerically benchmarked for 22 nm,
32 nm and 45 nm transistors. We showed that the performance
of the spin FPGA becomes superior to that of the
conventional CMOS FPGA as transistor size decreases and
MC ratio increases.

FIG. 6: Benchmark calculation of the advantage of spin
FPGA over CMOS FPGA for 20 circuits (area-delay product).

FIG. 7: Comparison of transistor generations: average
results of the benchmark calculation as a function of MC ratio.
The relations between generations are considered to be related
to the relative PMOS areas (see Eq. (1) and text).



TABLE I: Area of a single CLB and interconnect. The result for the
interconnect is taken from Fig. fj]